A Study of Prototype Selection Algorithms for Nearest Neighbour in Class-Imbalanced Problems

Authors

  • Jose J. Valero-Mas
  • Jorge Calvo-Zaragoza
  • Juan Ramón Rico-Juan
  • José Manuel Iñesta Quereda
Abstract

Prototype Selection methods aim to improve the efficiency of the Nearest Neighbour classifier by selecting a representative subset of the training set. These techniques have been studied in situations in which the classes are balanced, which is not representative of real-world data. Since class imbalance affects classification performance, data-level balancing approaches that artificially create or remove data from the set have been proposed. In this work, we study the performance of a set of prototype selection algorithms in imbalanced and algorithmically balanced contexts using data-driven approaches. Results show that the initial class balance remarkably influences the overall performance of prototype selection, with the best performance generally obtained when the data is algorithmically balanced before the selection stage.
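To make the pipeline described in the abstract concrete, the following is a minimal sketch of balancing a training set before prototype selection. It is not taken from the paper: random oversampling stands in for "data-level balancing", Hart's Condensed Nearest Neighbour (CNN) stands in for "a prototype selection algorithm", and the function names and toy data are all hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(point, prototypes):
    """Label of the prototype closest to `point` (the 1-NN rule)."""
    return min(prototypes, key=lambda p: sq_dist(point, p[0]))[1]


def random_oversample(data, rng):
    """Assumed balancing step: duplicate random samples of each smaller
    class until every class is as large as the largest one."""
    counts = Counter(label for _, label in data)
    target = max(counts.values())
    balanced = list(data)
    for label, n in counts.items():
        pool = [s for s in data if s[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced


def condensed_nn(data):
    """Assumed selection step (Hart's CNN): start from one sample and add
    every sample that the current prototype set misclassifies, repeating
    until a full pass adds nothing."""
    selected = [0]
    changed = True
    while changed:
        changed = False
        for i, (point, label) in enumerate(data):
            if i in selected:
                continue
            prototypes = [data[j] for j in selected]
            if nearest_label(point, prototypes) != label:
                selected.append(i)
                changed = True
    return [data[j] for j in selected]


# Hypothetical toy set: 8 majority samples (class 0), 2 minority samples (class 1).
data = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 1.0), 0),
        ((0.5, 0.5), 0), ((2.0, 0.0), 0), ((0.0, 2.0), 0), ((2.0, 2.0), 0),
        ((5.0, 5.0), 1), ((6.0, 5.0), 1)]

balanced = random_oversample(data, random.Random(0))  # balance first
prototypes = condensed_nn(balanced)                   # then select prototypes
print(len(data), len(balanced), len(prototypes))
```

Swapping the two steps, or skipping the balancing step, gives the imbalanced baseline that the abstract compares against; which order works best on real data is the question the study addresses, and this toy example does not answer it.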


Similar resources

EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It ...


Diagnose Effective Evolutionary Prototype Selection Using an Overlapping Measure

Evolutionary prototype selection has proven effective in the prototype selection domain. In most cases it improves on the results of classical prototype selection algorithms, but its computational cost is high. In this paper, we analyze the behavior of the evolutionary prototype selection strategy, considering a complexity measure for classification problems b...


Parallel MCNN (pMCNN) with Application to Prototype Selection on Large and Streaming Data

The Modified Condensed Nearest Neighbour (MCNN) algorithm for prototype selection is order-independent, unlike the Condensed Nearest Neighbour (CNN) algorithm. Though MCNN gives better performance, the time requirement is much higher than for CNN. To mitigate this, we propose a distributed approach called Parallel MCNN (pMCNN) which cuts down the time drastically while maintaining good performa...


Instance Selection for Class Imbalanced Problems by Means of Selecting Instances More than Once

Although many more complex learning algorithms exist, k-nearest neighbor (k-NN) is still one of the most successful classifiers in real-world applications. One of the ways of scaling up the k-nearest neighbors classifier to deal with huge datasets is instance selection. Due to the constantly growing amount of data in almost any pattern recognition task, we need more efficient instance selection ...


Personal Credit Score Prediction using Data Mining Algorithms (Case Study: Bank Customers)

Knowledge and information extraction from data is an age-old concept in scientific studies. In industrial decision-making processes, the application of this concept gives rise to data-mining opportunities. Personal credit scoring is an ever-vital tool for banking systems in order to manage and minimize the inherent risks of the financial sector, thus, the design and improvement of credit scorin...




Publication date: 2017